{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Torch basics"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline\n",
    "import torch"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "torch.__version__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Largely inspired from the tutorial [What is PyTorch?](https://pytorch.org/tutorials/beginner/former_torchies/tensor_tutorial.html)\n",
    "\n",
    "Tensors are used to encode the signal to process, but also the internal states and parameters of models.\n",
    "\n",
    "**Manipulating data through this constrained structure allows to use CPUs and GPUs at peak performance.**\n",
    "\n",
    "Construct a 3x5 matrix, uninitialized:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "torch.set_default_tensor_type('torch.FloatTensor')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x = torch.empty(3,5)\n",
    "print(x.type())\n",
    "print(x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you got an error this [stackoverflow link](https://stackoverflow.com/questions/50617917/overflow-when-unpacking-long-pytorch) might be useful..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x = torch.randn(3,5)\n",
    "print(x)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(x.size())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "torch.Size is in fact a [tuple](https://docs.python.org/3/tutorial/datastructures.html#tuples-and-sequences), so it supports the same operations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x.size()[1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x.size() == (3,5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Bridge to numpy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "y = x.numpy()\n",
    "print(y)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "a = np.ones(5)\n",
    "b = torch.from_numpy(a)\n",
    "c = torch.from_numpy(a)\n",
    "print(b)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "xr = torch.randn(3, 5)\n",
    "print(xr)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "xr + b"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# solve this bug!\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "xr+b"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(x+xr)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x.add_(xr)\n",
    "print(x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Any operation that mutates a tensor in-place is post-fixed with an ```_```\n",
    "\n",
    "For example: ```x.copy_(y)```, ```x.t_()```, will change ```x```."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(x.t())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x.t_()\n",
    "print(x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Also be careful, changing the torch tensor modify the numpy array and vice-versa..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(y)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "np.add(a, 1, out=a)\n",
    "print(b)\n",
    "print(c)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "torch.cuda.is_available()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "device = torch.device('cpu')\n",
    "# device = torch.device('cuda') # Uncomment this to run on GPU"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x.device"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# let us run this cell only if CUDA is available\n",
    "# We will use ``torch.device`` objects to move tensors in and out of GPU\n",
    "if torch.cuda.is_available():\n",
    "    device = torch.device(\"cuda\")          # a CUDA device object\n",
    "    y = torch.ones_like(x, device=device)  # directly create a tensor on GPU\n",
    "    x = x.to(device)                       # or just use strings ``.to(\"cuda\")``\n",
    "    z = x + y\n",
    "    print(z)\n",
    "    print(z.to(\"cpu\", torch.double))       # ``.to`` can also change dtype together!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "x = torch.randn(1)\n",
    "x = x.to(device)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# the following line is only useful if CUDA is available\n",
    "x = x.data\n",
    "print(x)\n",
    "print(x.item())\n",
    "print(x.numpy())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Simple interfaces to standard image data-bases"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torchvision\n",
    "\n",
    "data_dir = '/home/lelarge/data/'\n",
    "\n",
    "cifar = torchvision.datasets.CIFAR10(data_dir, train = True, download = True)\n",
    "x = torch.from_numpy(cifar.train_data).transpose(1, 3).transpose(2, 3).float()\n",
    "x = x / 255\n",
    "print(x.type(), x.size(), x.min().item(), x.max().item())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Narrows to the first images, converts to float\n",
    "x = x.narrow(0, 0, 48).float()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Showing images\n",
    "def show(img):\n",
    "    npimg = img.numpy()\n",
    "    plt.figure(figsize=(20,10))\n",
    "    plt.imshow(np.transpose(npimg, (1,2,0)), interpolation='nearest')\n",
    "    \n",
    "show(torchvision.utils.make_grid(x, nrow = 12))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Kills the green and blue channels\n",
    "x.narrow(1, 1, 2).fill_(0)\n",
    "show(torchvision.utils.make_grid(x, nrow = 12))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Autograd: automatic differentiation\n",
    "\n",
    "When executing tensor operations, PyTorch can automatically construct on-the-fly the graph of operations to compute the gradient of any quantity with respect to any tensor involved."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x = torch.ones(2, 2)\n",
    "print(x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A Tensor has a Boolean field *requires_grad*, set to False by default, which states if PyTorch should build the graph of operations so that gradients wrt to it can be computed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x.requires_grad"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x.numpy()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x.requires_grad_(True)\n",
    "x.requires_grad"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x.detach().numpy()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x.data.numpy()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x.requires_grad"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "y = x + 2\n",
    "print(y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[Broadcasting](https://docs.scipy.org/doc/numpy-1.13.0/user/basics.broadcasting.html) again!\n",
    "\n",
    "Broadcasting automagically expands dimensions by replicating coefficients, when it is necessary to perform operations.\n",
    "\n",
    "1. If one of the tensors has fewer dimensions than the other, it is reshaped by adding as many dimensions of size 1 as necessary in the front; then\n",
    "2. for every mismatch, if one of the two tensor is of size one, it is expanded along this axis by replicating  coefficients.\n",
    "\n",
    "If there is a tensor size mismatch for one of the dimension and neither of them is one, the operation fails."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "A = torch.tensor([[1.], [2.], [3.], [4.]])\n",
    "print(A.size())\n",
    "B = torch.tensor([[5., -5., 5., -5., 5.]])\n",
    "print(B.size())\n",
    "C = A + B"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "C"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Back to Autograd!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "y.requires_grad"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "z = y * y * 3\n",
    "out = z.mean()\n",
    "\n",
    "print(z, out)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After the computation is finished, i.e. _forward pass_, you can call ```.backward()``` and have all the gradients computed automatically."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "out.backward()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The gradients w.r.t. this variable is accumulated into ```.grad```."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(x.grad)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let’s call the ``out``\n",
    "*Variable* “$o$”.\n",
    "We have that:\n",
    "\n",
    "$y_i = x_i+2$\n",
    "\n",
    "$z_i = 3 y_i^2$ \n",
    "\n",
    "$o = \\frac{1}{4}\\sum_i z_i$ \n",
    "\n",
    "**Forward pass:**\n",
    "\n",
    "$y_i\\bigr\\rvert_{x_i=1} = 3$\n",
    "\n",
    "$z_i\\bigr\\rvert_{y_i=3} = 27$\n",
    "\n",
    "$o\\bigr\\rvert_{z_i=27} = 27$.\n",
    "\n",
    "Taking partial derivatives give:\n",
    "\n",
    "$\\frac{\\partial o}{\\partial z_i} = \\frac{1}{4}$\n",
    "\n",
    "$\\frac{\\partial z_i}{\\partial y_i} = 6 y_i$\n",
    "\n",
    "$\\frac{\\partial y_i}{\\partial x_i} =1$\n",
    "\n",
    "\n",
    "hence by the **chain-rule:**\n",
    "\n",
    "$\\frac{\\partial o}{\\partial x_i}\\bigr\\rvert_{x_i=1} = \\frac{\\partial o}{\\partial z_i}\\bigr\\rvert_{z_i=27}\\frac{\\partial z_i}{\\partial y_i}\\bigr\\rvert_{y_i=3}\\frac{\\partial y_i}{\\partial x_i}\\bigr\\rvert_{x_i=1} = \\frac{1}{4} * 18 * 1 = 4.5$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(y.grad)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[Why cant I see .grad of an intermediate variable?](https://discuss.pytorch.org/t/why-cant-i-see-grad-of-an-intermediate-variable/94)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "out.backward(torch.Tensor([2.0]))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "x = torch.ones(2, 2)\n",
    "x.requires_grad_(True)\n",
    "y = x+2\n",
    "z = 3 * y ** 2 \n",
    "out = z.mean()\n",
    "\n",
    "out.backward(retain_graph=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(x.grad)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "torch.autograd.grad(out, z, retain_graph=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "torch.autograd.grad(out, y, retain_graph=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(x.grad)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "out.backward(retain_graph=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(x.grad)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "out.backward(torch.Tensor([2.0]), retain_graph=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(x.grad)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Manually zero the gradients after updating weights\n",
    "x.grad.data.zero_()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The gradients must be set to zero manually. Otherwise they will cumulate across several _.backward()_ calls. \n",
    "This accumulating behavior is desirable in particular to compute the gradient of a loss summed over several “mini-batches,” or the gradient of a sum of losses.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "out.backward()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(x.grad)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you want to come back to the difference between detach and data see [Differences between .data and .detach](https://github.com/pytorch/pytorch/issues/6990)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "# Playing with pytorch: linear regression"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Warm-up: Linear regression with numpy"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Our model is:\n",
    "$$\n",
    "y_t = 2x^1_t-3x^2_t+1, \\quad t\\in\\{1,\\dots,30\\}\n",
    "$$\n",
    "\n",
    "Our task is given the 'observations' $(x_t,y_t)_{t\\in\\{1,\\dots,30\\}}$ to recover the weights $w^1=2, w^2=-3$ and the bias $b = 1$.\n",
    "\n",
    "In order to do so, we will solve the following optimization problem:\n",
    "$$\n",
    "\\underset{w^1,w^2,b}{\\operatorname{argmin}} \\sum_{t=1}^{30} \\left(w^1x^1_t+w^2x^2_t+b-y_t\\right)^2\n",
    "$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from numpy.random import random\n",
    "# generate random input data\n",
    "x = random((30,2))\n",
    "\n",
    "# generate labels corresponding to input data x\n",
    "y = np.dot(x, [2., -3.]) + 1.\n",
    "w_source = np.array([2., -3.])\n",
    "b_source  = np.array([1.])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x[:5]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "from mpl_toolkits.mplot3d import Axes3D\n",
    "\n",
    "def plot_figs(fig_num, elev, azim, x, y, weights, bias):\n",
    "    fig = plt.figure(fig_num, figsize=(4, 3))\n",
    "    plt.clf()\n",
    "    ax = Axes3D(fig, elev=elev, azim=azim)\n",
    "    ax.scatter(x[:, 0], x[:, 1], y)\n",
    "    ax.plot_surface(np.array([[0, 0], [1, 1]]),\n",
    "                    np.array([[0, 1], [0, 1]]),\n",
    "                    (np.dot(np.array([[0, 0, 1, 1],\n",
    "                                          [0, 1, 0, 1]]).T, weights) + bias).reshape((2, 2)),\n",
    "                    alpha=.5)\n",
    "    ax.set_xlabel('x_1')\n",
    "    ax.set_ylabel('x_2')\n",
    "    ax.set_zlabel('y')\n",
    "    \n",
    "def plot_views(x, y, w, b):\n",
    "    #Generate the different figures from different views\n",
    "    elev = 43.5\n",
    "    azim = -110\n",
    "    plot_figs(1, elev, azim, x, y, w, b[0])\n",
    "\n",
    "    plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plot_views(x, y, w_source, b_source)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In vector form, we define:\n",
    "$$\n",
    "\\hat{y}_t = {\\bf w}^T{\\bf x}_t+b\n",
    "$$\n",
    "and we want to minimize the loss given by:\n",
    "$$\n",
    "loss = \\sum_t\\underbrace{\\left(\\hat{y}_t-y_t \\right)^2}_{loss_t}.\n",
    "$$\n",
    "\n",
    "To minimize the loss we first compute the gradient of each $loss_t$:\n",
    "\\begin{eqnarray*}\n",
    "\\frac{\\partial{loss_t}}{\\partial w^1} &=& 2x^1_t\\left({\\bf w}^T{\\bf x}_t+b-y_t \\right)\\\\\n",
    "\\frac{\\partial{loss_t}}{\\partial w^2} &=& 2x^2_t\\left({\\bf w}^T{\\bf x}_t+b-y_t \\right)\\\\\n",
    "\\frac{\\partial{loss_t}}{\\partial b} &=& 2\\left({\\bf w}^T{\\bf x}_t+b-y_t \\right)\n",
    "\\end{eqnarray*}\n",
    "\n",
    "For one epoch, **Stochastic Gradient Descent with minibatches of size 1** then updates the weigts and bias by running the following loop: \n",
    "\n",
    "for $t \\in \\{1,\\dots,30\\}$, \n",
    "\n",
    "\\begin{eqnarray*}\n",
    "w^1_{t+1}&=&w^1_{t}-\\alpha\\frac{\\partial{loss_t}}{\\partial w^1} \\\\\n",
    "w^2_{t+1}&=&w^2_{t}-\\alpha\\frac{\\partial{loss_t}}{\\partial w^2} \\\\\n",
    "b_{t+1}&=&b_{t}-\\alpha\\frac{\\partial{loss_t}}{\\partial b},\n",
    "\\end{eqnarray*}\n",
    "\n",
    "if $t = 30$, set $w^1_1=w^1_{31}$, $w^2_1 = w^2_{31}$ and $b_1=b_{31}$.\n",
    "\n",
    "$\\alpha>0$ is called the learning rate.\n",
    "\n",
    "Then we run several epochs..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# randomly initialize learnable weights and bias\n",
    "w_init = random(2)\n",
    "b_init = random(1)\n",
    "\n",
    "w = w_init\n",
    "b = b_init\n",
    "print(\"initial values of the parameters:\", w, b )\n",
    "\n",
    "\n",
    "# our model forward pass\n",
    "def forward(x):\n",
    "    return x.dot(w)+b\n",
    "\n",
    "# Loss function\n",
    "def loss(x, y):\n",
    "    y_pred = forward(x)\n",
    "    return (y_pred - y)**2 \n",
    "\n",
    "print(\"initial loss:\", np.sum([loss(x_val,y_val) for x_val, y_val in zip(x, y)]) )\n",
    "\n",
    "# compute gradient\n",
    "def gradient(x, y):  # d_loss/d_w, d_loss/d_c\n",
    "    return 2*(x.dot(w)+b - y)*x, 2 * (x.dot(w)+b - y)\n",
    " \n",
    "learning_rate = 1e-2\n",
    "# Training loop with minibatch (of size 1)\n",
    "for epoch in range(10):\n",
    "    l = 0\n",
    "    for x_val, y_val in zip(x, y):\n",
    "        grad_w, grad_b = gradient(x_val, y_val)\n",
    "        w = w - learning_rate * grad_w\n",
    "        b = b - learning_rate * grad_b\n",
    "        l += loss(x_val, y_val)\n",
    "\n",
    "    print(\"progress:\", \"epoch:\", epoch, \"loss\",l[0])\n",
    "\n",
    "# After training\n",
    "print(\"estimation of the parameters:\", w, b )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plot_views(x, y, w, b)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that the actual gradient of the loss is given by:\n",
    "$$\n",
    "\\frac{\\partial{loss}}{\\partial w^1} =\\sum_t \\frac{\\partial{loss_t}}{\\partial w^1},\\quad\n",
    "\\frac{\\partial{loss}}{\\partial w^2} =\\sum_t \\frac{\\partial{loss_t}}{\\partial w^2},\\quad\n",
    "\\frac{\\partial{loss}}{\\partial b} =\\sum_t \\frac{\\partial{loss_t}}{\\partial b}\n",
    "$$\n",
    "\n",
    "For one epoch, **(Batch) Gradient Descent** updates the weights and bias as follows:\n",
    "\\begin{eqnarray*}\n",
    "w^1_{new}&=&w^1_{old}-\\alpha\\frac{\\partial{loss}}{\\partial w^1} \\\\\n",
    "w^2_{new}&=&w^2_{old}-\\alpha\\frac{\\partial{loss}}{\\partial w^2} \\\\\n",
    "b_{new}&=&b_{old}-\\alpha\\frac{\\partial{loss}}{\\partial b},\n",
    "\\end{eqnarray*}\n",
    "\n",
    "and then we run several epochs.\n",
    "\n",
    "Exercice: explain the difference between the 2 schemes?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "w = w_init\n",
    "b = b_init\n",
    "print(\"initial values of the parameters:\", w, b )\n",
    "\n",
    "learning_rate = 1e-2\n",
    "# Training loop\n",
    "for epoch in range(10):\n",
    "    grad_w = np.array([0,0])\n",
    "    grad_b = np.array(0)\n",
    "    l = 0\n",
    "    for x_val, y_val in zip(x, y):\n",
    "        grad_w = np.add(grad_w,gradient(x_val, y_val)[0])\n",
    "        grad_b = np.add(grad_b,gradient(x_val, y_val)[1])\n",
    "        l += loss(x_val, y_val)\n",
    "    w = w - learning_rate * grad_w\n",
    "    b = b - learning_rate * grad_b\n",
    "    print(\"progress:\", \"epoch:\", epoch, \"loss\",l[0])\n",
    "\n",
    "# After training\n",
    "print(\"estimation of the parameters:\", w, b)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plot_views(x, y, w, b)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Linear regression with tensors"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "dtype = torch.FloatTensor\n",
    "# dtype = torch.cuda.FloatTensor # Uncomment this to run on GPU"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "x_t = torch.from_numpy(x).type(dtype)\n",
    "y_t = torch.from_numpy(y).type(dtype).unsqueeze(1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is an implementation of **(Batch) Gradient Descent** with tensors.\n",
    "\n",
    "Note that in the main loop, the functions loss_t and gradient_t are always called with the same inputs: they can easily be incorporated into the loop (we'll do that below)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "w_init_t = torch.from_numpy(w_init).type(dtype)\n",
    "b_init_t = torch.from_numpy(b_init).type(dtype)\n",
    "\n",
    "w_t = w_init_t.clone()\n",
    "w_t.unsqueeze_(1)\n",
    "b_t = b_init_t.clone()\n",
    "b_t.unsqueeze_(1)\n",
    "print(\"initial values of the parameters:\", w_t, b_t )\n",
    "\n",
    "# our model forward pass\n",
    "def forward_t(x):\n",
    "    return x.mm(w_t)+b_t\n",
    "\n",
    "# Loss function\n",
    "def loss_t(x, y):\n",
    "    y_pred = forward_t(x)\n",
    "    return (y_pred - y).pow(2).sum()\n",
    "\n",
    "# compute gradient\n",
    "def gradient_t(x, y):  # d_loss/d_w, d_loss/d_c\n",
    "    return 2*torch.mm(torch.t(x),x.mm(w_t)+b_t - y), 2 * (x.mm(w_t)+b_t - y).sum()\n",
    "\n",
    "learning_rate = 1e-2\n",
    "for epoch in range(10):\n",
    "    l_t = loss_t(x_t,y_t)\n",
    "    grad_w, grad_b = gradient_t(x_t,y_t)\n",
    "    w_t = w_t-learning_rate*grad_w\n",
    "    b_t = b_t-learning_rate*grad_b\n",
    "    print(\"progress:\", \"epoch:\", epoch, \"loss\",l_t)\n",
    "\n",
    "# After training\n",
    "print(\"estimation of the parameters:\", w_t, b_t )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plot_views(x, y, w_t.numpy(), b_t.numpy())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Linear regression with Autograd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Setting requires_grad=True indicates that we want to compute gradients with\n",
    "# respect to these Tensors during the backward pass.\n",
    "w_v = w_init_t.clone().unsqueeze(1)\n",
    "w_v.requires_grad_(True)\n",
    "b_v = b_init_t.clone().unsqueeze(1)\n",
    "b_v.requires_grad_(True)\n",
    "print(\"initial values of the parameters:\", w_v.data, b_v.data )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "An implementation of **(Batch) Gradient Descent** without computing explicitly the gradient and using autograd instead."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for epoch in range(10):\n",
    "    y_pred = x_t.mm(w_v)+b_v\n",
    "    loss = (y_pred - y_t).pow(2).sum()\n",
    "    \n",
    "    # Use autograd to compute the backward pass. This call will compute the\n",
    "    # gradient of loss with respect to all Variables with requires_grad=True.\n",
    "    # After this call w.grad and b.grad will be Variables holding the gradient\n",
    "    # of the loss with respect to w and b respectively.\n",
    "    loss.backward()\n",
    "    \n",
    "    # Update weights using gradient descent. For this step we just want to mutate\n",
    "    # the values of w_v and b_v in-place; we don't want to build up a computational\n",
    "    # graph for the update steps, so we use the torch.no_grad() context manager\n",
    "    # to prevent PyTorch from building a computational graph for the updates\n",
    "    with torch.no_grad():\n",
    "        w_v -= learning_rate * w_v.grad\n",
    "        b_v -= learning_rate * b_v.grad\n",
    "    \n",
    "    # Manually zero the gradients after updating weights\n",
    "    # otherwise gradients will be acumulated after each .backward()\n",
    "    w_v.grad.zero_()\n",
    "    b_v.grad.zero_()\n",
    "    \n",
    "    print(\"progress:\", \"epoch:\", epoch, \"loss\",loss.data.item())\n",
    "\n",
    "# After training\n",
    "print(\"estimation of the parameters:\", w_v.data, b_v.data.t() )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plot_views(x, y, w_v.data.numpy(), b_v.data.numpy())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Linear regression with neural network"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "An implementation of **(Batch) Gradient Descent** using the nn package. Here we have a super simple model with only one layer and no activation function!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Use the nn package to define our model as a sequence of layers. nn.Sequential\n",
    "# is a Module which contains other Modules, and applies them in sequence to\n",
    "# produce its output. Each Linear Module computes output from input using a\n",
    "# linear function, and holds internal Variables for its weight and bias.\n",
    "model = torch.nn.Sequential(\n",
    "    torch.nn.Linear(2, 1),\n",
    ")\n",
    "\n",
    "for m in model.children():\n",
    "    m.weight.data = w_init_t.clone().unsqueeze(0)\n",
    "    m.bias.data = b_init_t.clone()\n",
    "\n",
    "# The nn package also contains definitions of popular loss functions; in this\n",
    "# case we will use Mean Squared Error (MSE) as our loss function.\n",
    "loss_fn = torch.nn.MSELoss(size_average=False)\n",
    "\n",
    "# switch to train mode\n",
    "model.train()\n",
    "\n",
    "for epoch in range(10):\n",
    "    # Forward pass: compute predicted y by passing x to the model. Module objects\n",
    "    # override the __call__ operator so you can call them like functions. When\n",
    "    # doing so you pass a Variable of input data to the Module and it produces\n",
    "    # a Variable of output data.\n",
    "    y_pred = model(x_t)\n",
    "  \n",
    "    # Note this operation is equivalent to: pred = model.forward(x_v)\n",
    "\n",
    "    # Compute and print loss. We pass Variables containing the predicted and true\n",
    "    # values of y, and the loss function returns a Variable containing the\n",
    "    # loss.\n",
    "    loss = loss_fn(y_pred, y_t)\n",
    "\n",
    "    # Zero the gradients before running the backward pass.\n",
    "    model.zero_grad()\n",
    "\n",
    "    # Backward pass: compute gradient of the loss with respect to all the learnable\n",
    "    # parameters of the model. Internally, the parameters of each Module are stored\n",
    "    # in Variables with requires_grad=True, so this call will compute gradients for\n",
    "    # all learnable parameters in the model.\n",
    "    loss.backward()\n",
    "\n",
    "    # Update the weights using gradient descent. Each parameter is a Tensor, so\n",
    "    # we can access its data and gradients like we did before.\n",
    "    with torch.no_grad():\n",
    "        for param in model.parameters():\n",
    "            param.data -= learning_rate * param.grad\n",
    "        \n",
    "    print(\"progress:\", \"epoch:\", epoch, \"loss\",loss.data.item())\n",
    "\n",
    "# After training\n",
    "print(\"estimation of the parameters:\")\n",
    "for param in model.parameters():\n",
    "    print(param)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Last step, we use directly the optim package to update the weights and bias."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = torch.nn.Sequential(\n",
    "    torch.nn.Linear(2, 1),\n",
    ")\n",
    "\n",
    "for m in model.children():\n",
    "    m.weight.data = w_init_t.clone().unsqueeze(0)\n",
    "    m.bias.data = b_init_t.clone()\n",
    "\n",
    "loss_fn = torch.nn.MSELoss(size_average=False)\n",
    "\n",
    "model.train()\n",
    "\n",
    "optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)\n",
    "\n",
    "\n",
    "for epoch in range(10):\n",
    "    y_pred = model(x_t)\n",
    "    loss = loss_fn(y_pred, y_t)\n",
    "    print(\"progress:\", \"epoch:\", epoch, \"loss\",loss.item())\n",
    "    # Zero gradients, perform a backward pass, and update the weights.\n",
    "    optimizer.zero_grad()\n",
    "    loss.backward()\n",
    "    optimizer.step()\n",
    "    \n",
    "    \n",
    "# After training\n",
    "print(\"estimation of the parameters:\")\n",
    "for param in model.parameters():\n",
    "    print(param)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercise 1: Play with the code"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Change the number of samples from 30 to 300. What happens? How to correct it?\n",
    "\n",
    "In the initialization phase, remove the .clone() What happens? Why?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercise 2: Logistic regression"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Sigmoid function:\n",
    "$$\n",
    "\\sigma(y) = \\frac{1}{1+e^{-y}}\n",
    "$$\n",
    "\n",
    "The model is now\n",
    "$$\n",
    "Z_t = Ber(\\sigma(y_t)), \\quad t\\in\\{1,\\dots,30\\},\n",
    "$$\n",
    "and the task is still to recover the weights $w^1=2, w^2=-3$ and the bias $b = 1$ but now from the observations $(x_t,Z_t)_{t\\in \\{1,\\dots,30\\}}$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from scipy.special import expit\n",
    "xaxis = np.arange(-10.0, 10.0, 0.1)\n",
    "plt.plot(xaxis,[expit(x) for x in xaxis]);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You might need to install scipy first by runing:\n",
    "\n",
    "$ pip3 install scipy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from scipy.stats import bernoulli\n",
    "Z = bernoulli.rvs(expit(y))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What is an appropriate loss function now?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# solution!\n",
    "b = b.type('torch.FloatTensor')"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [conda env:pytorch]",
   "language": "python",
   "name": "conda-env-pytorch-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
